73 research outputs found

    From tools and databases to clinically relevant applications in miRNA research

    While early research focused in particular on the small portion of the human genome that encodes proteins, it became apparent that molecules responsible for many key functions are also encoded in the remaining regions. Originally, non-coding RNAs, i.e., molecules that are not translated into proteins, were thought to comprise only two classes (ribosomal RNAs and transfer RNAs). However, starting in the early 1980s, many other non-coding RNA classes were discovered. In the past two decades, small non-coding RNAs (sncRNAs), and in particular microRNAs (miRNAs), have become essential molecules in biological and biomedical research. In this thesis, five aspects of miRNA research are addressed. Building on the development of advanced computational software to analyze miRNA data (1), we generated an in-depth understanding of human and non-human miRNAs and created databases hosting this knowledge (2). In addition, the effects of technological advances were evaluated (3). We also contributed to the understanding of how miRNAs act in an orchestrated manner to target human genes (4). Finally, based on the insights gained from the tools and resources developed for these aspects, we evaluated the suitability of miRNAs as biomarkers (5). With the establishment of next-generation sequencing, the primary goal of this thesis was the creation of an advanced bioinformatics analysis pipeline for high-throughput miRNA sequencing data, primarily focused on human. Consequently, miRMaster, a web-based software solution that analyzes hundreds of sequencing samples within a few hours, was implemented. The tool was designed to support different sequencing technologies and library preparation techniques. This flexibility allowed miRMaster to build a substantial user base, resulting in over 120,000 processed samples and 1.5 billion processed reads as of July 2021, and thereby laid the foundation for the second goal of this thesis. 
Indeed, the implementation of a feature allowing users to share their uploaded data contributed strongly to the generation of a detailed annotation of the human small non-coding transcriptome. This annotation was integrated into a new miRNA database, miRCarta, which models thousands of miRNA candidates and their corresponding read expression profiles. A subset of these candidates was then evaluated in the context of different diseases and validated. The knowledge gained was subsequently used to validate additional miRNA candidates and to estimate the number of miRNAs in humans. The large collection of samples gathered with miRMaster over many years was also integrated into miRSwitch, a web server evaluating miRNA arm shifts and switches. Finally, we published an updated version of miRMaster that expands its scope to other species and adds further downstream analysis capabilities. The second goal of this thesis was further pursued by investigating the distribution of miRNAs across different human tissues and body fluids, as well as the variability of miRNA profiles over the four seasons of the year. Furthermore, small non-coding RNAs in zoo animals were examined, and a tissue atlas of small non-coding RNAs in mice was generated. The third goal, the assessment of technological advances, was addressed by evaluating the new combinatorial probe-anchor synthesis-based sequencing technology published by BGI, analyzing the effect of RNA integrity on sequencing data, analyzing low-input library preparation protocols, and comparing template-switch-based library preparation protocols to ligation-based ones. In addition, CoolMPS, an antibody-based labeling sequencing chemistry, was investigated. The fourth goal of this thesis, deriving an understanding of the orchestrated regulation by miRNAs, was first pursued by implementing miRTargetLink, a web server visualizing miRNA-gene interaction networks. 
Subsequently, miRPathDB, a database incorporating pathways affected by miRNAs and their targets, was implemented, as well as miEAA 2.0, a web server offering quick miRNA set enrichment analyses in over 130,000 categories spanning 10 different species. In addition, miRSNPdb, a database evaluating the effects of single nucleotide polymorphisms and variants in miRNAs or in their target genes, was created. Finally, the fifth goal of the thesis, evaluating the suitability of miRNAs as biomarkers for human diseases, was tackled by investigating miRNA expression profiles with machine learning. An Alzheimer's disease cohort with over 400 individuals was analyzed, as well as another neurodegenerative disease cohort with Parkinson's disease patients at multiple time points and healthy controls. Furthermore, a lung cancer cohort covering 3,000 individuals was examined to evaluate the suitability of an early detection test. In addition, we evaluated the expression profile changes induced by aging in a cohort of 1,334 healthy individuals and over 3,000 diseased patients. Altogether, the tools, databases and research papers described herein present valuable advances and insights into the miRNA research field and have been used and cited by the research community over 2,000 times as of July 2021.
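
miEAA 2.0 is described above only as offering miRNA set enrichment analyses across many categories. One common way to run such an analysis is an over-representation test: for each category, a one-sided hypergeometric test asks whether the query miRNAs overlap the category more than expected by chance, followed by Benjamini-Hochberg correction across all tested categories. The sketch below is a minimal illustration under that assumption; it is not miEAA's implementation, and the background, category names and query set in the usage section are hypothetical.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(N items, K in category, n drawn)."""
    total = comb(N, n)
    return sum(comb(K, x) * comb(N - K, n - x) for x in range(k, min(K, n) + 1)) / total


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonic adjusted values.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def over_representation(query, categories, universe):
    """Test every category for over-representation of the query miRNAs."""
    query = set(query) & universe
    rows = []
    for name, members in categories.items():
        members = set(members) & universe
        k = len(query & members)
        p = hypergeom_sf(k, len(universe), len(members), len(query))
        rows.append((name, k, p))
    adjusted = benjamini_hochberg([p for _, _, p in rows])
    results = [(name, k, p, q) for (name, k, p), q in zip(rows, adjusted)]
    return sorted(results, key=lambda r: r[2])  # most significant first


if __name__ == "__main__":
    universe = {f"hsa-miR-{i}" for i in range(1, 101)}  # hypothetical background
    categories = {  # hypothetical categories
        "category_A": {f"hsa-miR-{i}" for i in range(1, 11)},
        "category_B": {f"hsa-miR-{i}" for i in range(50, 70)},
    }
    query = {"hsa-miR-1", "hsa-miR-2", "hsa-miR-3", "hsa-miR-60"}
    for name, overlap, p, q in over_representation(query, categories, universe):
        print(name, overlap, p, q)
```

Each result row holds the category, the observed overlap, the raw p-value and the adjusted value; with over 130,000 categories, the multiple-testing correction matters far more than in this toy example.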

    On the lifetime of bioinformatics web services

    Web services are used across all disciplines of the life sciences, and the online landscape grows by hundreds of new servers annually. However, availability varies, and maintenance practices are largely inconsistent. We screened the availability of 2396 web tools published during the past 10 years. All servers were accessed over 133 days, and 318 668 index files were stored in a local database. The fraction of accessible tools increases almost linearly with publication year, with the highest availability for tools published in 2019 and 2020 (~90%) and the lowest for tools published in 2010 (~50%). Within the 133-day test frame, 31% of tools were always working, 48.4% occasionally and 20.6% never. Consecutive downtimes were typically below 5 days, with a median of 1 day, and were unevenly distributed over the weekdays. A rescue experiment on 47 tools that were published from 2019 onwards but never accessible showed that 51.1% of them could be restored in due time. We found a positive association between the number of citations and the probability of a web server being reachable. We then identified common challenges and formulated categorical recommendations for researchers planning to develop web-based resources. As an implication of our study, we propose developing a repository for automatic API testing and sustainability indexing.
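
The availability figures above rest on classifying each tool's check history. As a minimal sketch, assuming one boolean check result per tool and day (the exact sampling schedule is an assumption here), tools can be labeled as always, occasionally or never reachable, and consecutive downtimes can be measured as runs of failed checks among the occasionally reachable tools. The tool names in the usage section are hypothetical.

```python
from statistics import median


def classify(checks):
    """Label a tool by its check history (True = reachable)."""
    if all(checks):
        return "always"
    if not any(checks):
        return "never"
    return "occasionally"


def downtime_runs(checks):
    """Lengths of consecutive failed checks, e.g. [T, F, F, T, F] -> [2, 1]."""
    runs, current = [], 0
    for ok in checks:
        if ok:
            if current:
                runs.append(current)
            current = 0
        else:
            current += 1
    if current:  # close a downtime that lasts until the last check
        runs.append(current)
    return runs


def summarize(availability):
    """Percent of tools per label and the median downtime of occasional tools."""
    counts = {"always": 0, "occasionally": 0, "never": 0}
    runs = []
    for checks in availability.values():
        label = classify(checks)
        counts[label] += 1
        if label == "occasionally":
            runs.extend(downtime_runs(checks))
    shares = {label: 100 * n / len(availability) for label, n in counts.items()}
    return shares, (median(runs) if runs else None)


if __name__ == "__main__":
    daily = {  # hypothetical tools, one check per day
        "tool_a": [True] * 7,
        "tool_b": [True, False, False, True, True, False, True],
        "tool_c": [False] * 7,
    }
    print(summarize(daily))
```

Excluding never-reachable tools from the downtime statistic keeps permanently offline servers from dominating the median; whether the study did the same is not stated.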

    DynaVenn: web-based computation of the most significant overlap between ordered sets

    Background: In many research disciplines, ordered lists are compared. One example is comparing a subset of all significant genes or proteins in a primary study to those in a replication study. Often, the tops of the lists are compared using Venn diagrams, or more precisely Euler diagrams (set diagrams showing logical relations between a finite collection of different sets). If different cohort sizes, techniques or evaluation algorithms were applied, however, a direct comparison of significant genes at a fixed threshold can be misleading, and approaches comparing entire lists would be more appropriate. Results: We developed DynaVenn, a web-based tool that incrementally creates all possible subsets from two or three ordered lists and computes a p-value for the overlap of each combination. Correspondingly, dynamic Venn diagrams are generated as graphical representations. Additionally, an animation shows how the most significant overlap is reached by backtracking. We demonstrate the improved performance of DynaVenn over an arbitrary cut-off approach on an Alzheimer’s disease biomarker set. Conclusion: DynaVenn combines the calculation of the most significant overlap between different cohorts with an intuitive visualization of the results. It is freely available as a web service at http://www.ccb.uni-saarland.de/dynavenn
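
The core computation can be sketched for two lists. Assuming the overlap between the top i entries of list A and the top j entries of list B is scored with a one-sided hypergeometric test against a universe of N items (the abstract does not name DynaVenn's exact test, and this sketch does not cover the three-list case), the most significant overlap is the (i, j) pair with the smallest p-value. The gene names in the usage section are hypothetical.

```python
from math import comb


def overlap_pvalue(k, N, i, j):
    """One-sided hypergeometric P(overlap >= k) for top-i of A vs top-j of B."""
    return sum(comb(i, x) * comb(N - i, j - x) for x in range(k, min(i, j) + 1)) / comb(N, j)


def most_significant_overlap(list_a, list_b, universe_size):
    """Scan all prefix pairs; return (p-value, i, j, overlap) with the smallest p."""
    rank_in_a = {item: r for r, item in enumerate(list_a)}
    best = (1.0, 0, 0, 0)
    for i in range(1, len(list_a) + 1):
        k = 0
        for j in range(1, len(list_b) + 1):
            # The j-th item of B adds to the overlap if it is among A's top i.
            r = rank_in_a.get(list_b[j - 1])
            if r is not None and r < i:
                k += 1
            p = overlap_pvalue(k, universe_size, i, j)
            if p < best[0]:
                best = (p, i, j, k)
    return best


if __name__ == "__main__":
    primary = ["GENE1", "GENE2", "GENE3", "GENE4", "GENE5"]      # hypothetical
    replication = ["GENE2", "GENE1", "GENE7", "GENE3", "GENE9"]  # hypothetical
    print(most_significant_overlap(primary, replication, universe_size=1000))
```

The exhaustive scan evaluates every prefix pair, which is adequate for lists of a few hundred entries; note that picking the minimum over many pairs is itself a selection step, so the reported p-value should not be read as a single pre-specified test.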

    PLSDB: a resource of complete bacterial plasmids

    The study of bacterial isolates or communities requires the analysis of the plasmids they contain in order to characterize the organisms comprehensively. Plasmids harboring resistance and virulence factors are of particular interest, as they contribute to the dissemination of antibiotic resistance. As the number of newly sequenced bacterial genomes grows, a comprehensive resource is required that allows users to browse and filter the available plasmids and to perform sequence analyses. Here, we present PLSDB, a resource containing 13 789 plasmid records collected from the NCBI nucleotide database. The web server provides an interactive view of all obtained plasmids with additional meta information such as sequence characteristics, sample-related information and taxonomy. Moreover, nucleotide sequence data can be uploaded to search for short nucleotide sequences (e.g. specific genes) in the plasmids, to compare a given plasmid to the records in the collection, or to determine whether a sample contains one or more of the known plasmids (containment analysis). The resource is freely accessible at https://ccbmicrobe.cs.uni-saarland.de/plsdb/
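
Containment analysis asks what fraction of a known plasmid is present in a sample, rather than how similar two whole sequences are. A minimal sketch, assuming exact canonical k-mer matching with a hypothetical k of 21 and a hypothetical containment threshold of 0.9, is shown below. Production tools usually rely on sketching to scale to thousands of plasmids, and this is not PLSDB's own code.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")
_BASES = set("ACGT")


def canonical_kmers(seq, k):
    """Set of canonical k-mers (lexicographic min of k-mer and reverse complement)."""
    seq = seq.upper()
    out = set()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= _BASES:  # skip k-mers containing N or other symbols
            rc = kmer.translate(_COMPLEMENT)[::-1]
            out.add(min(kmer, rc))
    return out


def containment(plasmid, sample_reads, k=21):
    """Fraction of the plasmid's k-mers that occur anywhere in the sample reads."""
    ref = canonical_kmers(plasmid, k)
    if not ref:
        return 0.0
    seen = set()
    for read in sample_reads:
        seen |= canonical_kmers(read, k)
    return len(ref & seen) / len(ref)


def screen(plasmids, sample_reads, k=21, threshold=0.9):
    """Return {plasmid name: containment} for plasmids above the threshold."""
    hits = {}
    for name, seq in plasmids.items():
        score = containment(seq, sample_reads, k)
        if score >= threshold:
            hits[name] = score
    return hits
```

Canonical k-mers make the score independent of which strand a read comes from, so a read and its reverse complement contribute the same evidence.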

    A mouse tissue atlas of small noncoding RNA

    Small noncoding RNAs (ncRNAs) play a vital role in a broad range of biological processes, both in health and disease. A comprehensive quantitative reference of small ncRNA expression would significantly advance our understanding of ncRNA roles in shaping tissue functions. Here, we systematically profiled the levels of five ncRNA classes (microRNA [miRNA], small nucleolar RNA [snoRNA], small nuclear RNA [snRNA], small Cajal body-specific RNA [scaRNA], and transfer RNA [tRNA] fragments) across 11 mouse tissues by deep sequencing. Using 14 biological replicates spanning both sexes, we determined that ~30% of small ncRNAs are distributed across the body in a tissue-specific manner, some of them also being sexually dimorphic. We found that some miRNAs are subject to “arm switching” between healthy tissues and that tRNA fragments are retained within tissues in both a gene- and a tissue-specific manner. Of the 11 profiled tissues, we confirmed that the brain contains the largest number of unique small ncRNA transcripts, some of which were previously annotated while others were identified in this study. Furthermore, by combining these findings with single-cell chromatin accessibility (scATAC-seq) data, we were able to connect the identified brain-specific ncRNAs with their cell types of origin. These results yield the most comprehensive characterization of specific and ubiquitous small RNAs in individual murine tissues to date, and we expect that these data will serve as a resource for the further identification of ncRNAs involved in tissue function in health and dysfunction in disease.
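
The abstract does not state how tissue specificity was scored. One widely used score is the τ (tau) index, τ = Σᵢ (1 − xᵢ / max x) / (n − 1) over n tissues, which is 0 for uniform expression and 1 for expression in a single tissue. The sketch below assumes τ, hypothetical transcript names and a hypothetical cutoff of 0.8; it illustrates the idea only and is not the study's method.

```python
def tau(expression):
    """Tau tissue-specificity index: 0 = uniform, 1 = expressed in one tissue only."""
    n = len(expression)
    if n < 2:
        raise ValueError("tau needs at least two tissues")
    peak = max(expression)
    if peak <= 0:
        return None  # not expressed anywhere: specificity undefined
    return sum(1 - x / peak for x in expression) / (n - 1)


def tissue_specific(profiles, tissues, cutoff=0.8):
    """Map each transcript with tau >= cutoff to the tissue of its peak expression."""
    hits = {}
    for name, values in profiles.items():
        score = tau(values)
        if score is not None and score >= cutoff:
            hits[name] = (tissues[values.index(max(values))], score)
    return hits


if __name__ == "__main__":
    tissues = ["brain", "liver", "heart"]  # hypothetical subset of tissues
    profiles = {  # hypothetical normalized, non-negative expression values
        "ncRNA_1": [50.0, 1.0, 0.0],
        "ncRNA_2": [10.0, 9.0, 11.0],
    }
    print(tissue_specific(profiles, tissues))
```

Expression values are expected to be non-negative (for example, normalized and log-transformed counts); replicates would typically be averaged per tissue before scoring.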

    HumiR: Web Services, Tools and Databases for Exploring Human microRNA Data

    Computational tools and databases have been developed for many aspects of research on small non-coding RNAs, especially microRNAs. These include the quantification of miRNAs, piRNAs, tRNAs and tRNA fragments, circRNAs and others. Furthermore, the prediction of new miRNAs, isomiRs, arm switch events, targets and target pathways, as well as miRNA pathway enrichment, are common tasks. Additionally, databases and resources containing expression profiles, e.g., from different tissues, organs or cell types, are generated. This information in turn leads to improved miRNA repositories. While most of the respective tools are implemented in a species-independent manner, we focused on tools for human small non-coding RNAs. This covers four aspects: (1) miRNA analysis tools, (2) databases on miRNAs and their variations, (3) databases on expression profiles, and (4) miRNA helper tools facilitating frequent tasks such as naming conversion or reporter assay design. Although dependencies between the tools exist and several tools are jointly used in studies, their interoperability is limited. We present HumiR, a joint web presence for our tools. HumiR facilitates entry into the world of miRNA research, supports the selection of the right tool for a research task and represents the very first step towards a fully integrated knowledge base for human small non-coding RNA research. We demonstrate the utility of HumiR by performing a comprehensive analysis of Alzheimer’s miRNAs.
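
Naming conversion is one of the helper tasks listed above. As a hypothetical illustration (not HumiR's code), the sketch below parses names that follow a simplified miRBase-style pattern, species-miR-number[letter][-copy][-arm] (e.g. hsa-miR-34c-5p), and derives a precursor-style name. It covers only this simplified pattern, and some mature names map to several precursor copies (e.g. let-7a), which a real converter has to resolve against a database.

```python
import re

# Simplified pattern: species-(miR|mir|let)-number[letter][-copy][-arm]
_MIRNA_NAME = re.compile(
    r"^(?P<species>[a-z]{3,4})-(?P<kind>miR|mir|let)-(?P<number>\d+)"
    r"(?P<letter>[a-z]?)(?:-(?P<copy>\d+))?(?:-(?P<arm>[35]p))?$"
)


def parse_mirna_name(name):
    """Split a miRNA name into its parts, or return None if it does not match."""
    match = _MIRNA_NAME.match(name)
    return match.groupdict() if match else None


def precursor_name(name):
    """Drop the arm suffix and use the lower-case 'mir' precursor prefix."""
    parts = parse_mirna_name(name)
    if parts is None:
        return None
    kind = "let" if parts["kind"] == "let" else "mir"
    result = f"{parts['species']}-{kind}-{parts['number']}{parts['letter']}"
    if parts["copy"]:
        result += f"-{parts['copy']}"
    return result


if __name__ == "__main__":
    for name in ["hsa-miR-34c-5p", "hsa-let-7a-3p", "hsa-miR-92a-1-5p", "miR-21"]:
        print(name, parse_mirna_name(name), precursor_name(name))
```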

    A CYPome-wide study reveals new potential players in the pathogenesis of Parkinson's disease

    Genetic and environmental factors lead to the manifestation of Parkinson's disease (PD), but the related mechanisms are only rudimentarily understood. Cytochromes P450 (P450s) are involved in the biotransformation of toxic compounds and in many physiological processes and are thus prime candidates for involvement in PD. However, so far only SNPs (single nucleotide polymorphisms) in CYP2D6 and CYP2E1 have been associated with susceptibility to PD. Our aim was to evaluate the role of all 57 human P450s and their redox partners in the etiology and pathophysiology of PD and to identify novel potential players, which may lead to the identification of new biomarkers and to a causative treatment of PD. The PPMI (Parkinson's Progression Markers Initiative) database was used to extract the gene sequences of all 57 P450s and their three redox partners to analyze the association of SNPs with the occurrence of PD. Using statistical analyses of the data, the corresponding odds ratios (OR) and confidence intervals (CI) were calculated. We identified SNPs significantly over-represented in patients with a genetic predisposition for PD (GPD patients) or with idiopathic PD (IPD patients) compared to healthy controls (HC). Xenobiotic-metabolizing P450s show a significant accumulation of SNPs in PD patients compared with HC, supporting the role of toxic compounds in the pathogenesis of PD. Moreover, SNPs with high OR values (>5) in P450s catalyzing the degradation of cholesterol (CYP46A1, CYP7B1, CYP39A1) indicate a prominent role of brain cholesterol metabolism in PD risk. Finally, P450s participating in the metabolism of eicosanoids show a strong over-representation of SNPs in PD patients, underlining the effect of inflammation on the pathogenesis of PD. The redox partners of the P450s also show SNPs with OR > 5 in PD patients. 
Taken together, we demonstrate that SNPs in 26 out of 57 P450s are at least 5-fold over-represented in PD patients, suggesting these P450s as new potential players in the pathogenesis of PD. For the first time, exceptionally high OR values (up to 12.9) were found. This will lead to deeper insight into the origin and development of PD and may be applied to develop novel strategies for a causative treatment of this disease. The work was supported by a research grant from the “Dr. Rolf M. Schwiete Stiftung” Mannheim/Germany
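
Odds ratios and their confidence intervals are computed from a 2×2 table of counts. The sketch below returns an OR with a 95% Wald interval on the log scale and applies the Haldane-Anscombe correction (adding 0.5 to every cell) when a cell is zero. Whether the study counted alleles or carriers, and which interval method it used, are not stated, so both the correction and the counts in the usage section are assumptions.

```python
from math import exp, log, sqrt


def odds_ratio_ci(case_alt, case_ref, ctrl_alt, ctrl_ref, z=1.96):
    """Odds ratio with a Wald confidence interval (z = 1.96 gives 95%)."""
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    if 0 in (a, b, c, d):
        # Haldane-Anscombe correction keeps the OR and its log finite.
        a, b, c, d = a + 0.5, b + 0.5, c + 0.5, d + 0.5
    odds_ratio = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of ln(OR)
    lower = exp(log(odds_ratio) - z * se_log)
    upper = exp(log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # Hypothetical counts: SNP present/absent in PD patients and in controls.
    or_, lo, hi = odds_ratio_ci(case_alt=10, case_ref=90, ctrl_alt=2, ctrl_ref=98)
    print(f"OR = {or_:.2f}, 95% CI [{lo:.2f}, {hi:.2f}], OR > 5: {or_ > 5}")
```

With small control counts, as in this toy table, the interval becomes very wide, which is worth keeping in mind when interpreting OR values above 5.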

    Aviator: a web service for monitoring the availability of web services

    With Aviator, we present a web service and repository that facilitates the surveillance of online tools. Aviator consists of a user-friendly website and two modules: a general module based on literature mining and a manually curated module. The general module currently checks 9417 websites twice a day for availability and stores many features (frontend and backend response time, required RAM and size of the web page, security certificates, analytics tools and trackers embedded in the web page, and others) in a data warehouse. Aviator also provides analysis functionality; for example, authors can check and evaluate the availability of their own tools or those of their peers. Likewise, users can check the availability of a tool they intend to use in research or teaching to avoid including unstable tools. The curated section of Aviator offers additional services. We provide API snippets for common programming languages (Perl, PHP, Python, JavaScript) as well as OpenAPI documentation for embedding in the backend of one's own web services to test their function automatically. We query the respective APIs twice a day and send automated notifications in case of an unexpected result. The same analysis functionality as for the literature-based module is also available for the curated section. Aviator can be used freely at https://www.ccb.uni-saarland.de/aviator
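
The curated module runs scheduled checks against each registered API and flags unexpected results. A minimal Python sketch of one such check, using only the standard library, is shown below. The function names and the idea of matching an expected status code and response substring are hypothetical and are not Aviator's published snippets; the fetch function is injectable so the logic can be exercised without network access.

```python
import time
import urllib.request
from urllib.error import HTTPError


def fetch(url, timeout=10.0):
    """GET a URL and return (status code, body); HTTP errors still yield a status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status, resp.read()
    except HTTPError as err:  # urlopen raises for 4xx/5xx responses
        return err.code, b""


def check_endpoint(url, expected_status=200, expected_text=None, fetcher=fetch):
    """Run one availability check and report whether the result matched expectations."""
    started = time.monotonic()
    try:
        status, body = fetcher(url)
    except (OSError, ValueError) as err:  # unreachable host, timeout, malformed URL
        return {"url": url, "ok": False, "error": str(err),
                "seconds": time.monotonic() - started}
    ok = status == expected_status
    if ok and expected_text is not None:
        ok = expected_text.encode() in body
    return {"url": url, "ok": ok, "status": status,
            "seconds": time.monotonic() - started}


if __name__ == "__main__":
    # Hypothetical endpoint; a scheduler (e.g. cron) could run this twice a day
    # and send a notification whenever "ok" is False.
    print(check_endpoint("https://www.example.org/", expected_text="Example"))
```

Recording the elapsed time alongside the verdict mirrors the response-time features described above, and checking a response substring catches servers that answer with status 200 but serve an error page.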

    Prospect and challenge of detecting dynamic gene copy number increases in stem cells by whole genome sequencing

    Gene amplification is an evolutionarily well-conserved and highly efficient mechanism to increase the amount of specific proteins. In humans, gene amplification is a hallmark of cancer and has recently been found during stem cell differentiation. Amplifications in stem cells are restricted to specific tissue areas and time windows, which makes their detection difficult. Here, we report on the performance of deep whole genome sequencing (WGS; average 82-fold depth of coverage) on the BGISEQ with nanoball technology to detect amplifications in human mesenchymal and neural stem cells. As reference technologies, we applied array-based comparative genomic hybridization (aCGH), fluorescence in situ hybridization (FISH) and qPCR. Using different in silico detection strategies, we analyzed the potential of WGS for identifying amplifications. Our results provide evidence that WGS accurately identifies changes in the copy number profiles during human stem cell differentiation. However, the identified changes are not always consistent between WGS and aCGH. WGS results and qPCR validation were concordant in 83.3% of the 36 tested cases (30 of 36). In sum, both genome-wide techniques, aCGH and WGS, have unique advantages and specific challenges, calling for locus-specific confirmation by the low-throughput approaches qPCR or FISH.
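
One common in silico strategy for WGS-based amplification detection is read-depth analysis: reads are counted in fixed-size bins, counts are scaled to copy numbers relative to a diploid baseline, and runs of bins above a threshold are reported. The sketch below assumes this approach with hypothetical parameters (baseline = median of non-empty bins, copy number of at least 3, at least 2 consecutive bins). It omits GC and mappability correction, which real pipelines need, and is not necessarily one of the strategies used in the study.

```python
from statistics import median


def bin_counts(read_starts, chrom_length, bin_size):
    """Count read start positions (0-based) in fixed-size bins along one chromosome."""
    counts = [0] * ((chrom_length + bin_size - 1) // bin_size)
    for pos in read_starts:
        if 0 <= pos < chrom_length:
            counts[pos // bin_size] += 1
    return counts


def amplified_segments(counts, ploidy=2, min_copy_number=3.0, min_bins=2):
    """Return (first_bin, last_bin, mean copy number) for runs of high-copy bins."""
    baseline = median(c for c in counts if c > 0)  # assumes most bins are diploid
    copy_numbers = [ploidy * c / baseline for c in counts]
    segments, start = [], None
    for i, cn in enumerate(copy_numbers + [0.0]):  # sentinel closes a final run
        if cn >= min_copy_number and start is None:
            start = i
        elif cn < min_copy_number and start is not None:
            if i - start >= min_bins:
                run = copy_numbers[start:i]
                segments.append((start, i - 1, sum(run) / len(run)))
            start = None
    return segments


if __name__ == "__main__":
    # Hypothetical per-bin read counts for one chromosome.
    counts = [100, 98, 103, 210, 205, 199, 101, 97, 250, 99]
    print(amplified_segments(counts))
```

Requiring several consecutive bins suppresses isolated noisy bins, at the cost of missing very short amplicons; this trade-off is one reason locus-specific confirmation by qPCR or FISH remains useful.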

    IMOTA: an interactive multi-omics tissue atlas for the analysis of human miRNA-target interactions

    Web repositories have been generated for almost all ‘omics’ types, detailing the repertoire of representatives across different tissues or cell types. A logical next step is to combine these valuable sources. With IMOTA (interactive multi-omics tissue atlas), we developed a database that includes 23 725 relations between miRNAs and 23 tissues, 310 932 relations between mRNAs and the same tissues, and 63 043 relations between proteins and the 23 tissues in Homo sapiens. IMOTA also contains data on tissue-specific interactions, e.g. information on 331 413 miRNA and target gene pairs that are jointly expressed in the considered tissues. Using intuitive filter and visualization techniques, various questions can be answered with minimal effort. These include general questions as well as requests specific to genes, miRNAs or proteins. An example of a general task could be ‘identify all miRNAs, genes and proteins in the lung that are highly expressed and where experimental evidence proves that the miRNAs target the genes’. An example of a specific request for a gene and a miRNA could be ‘In which tissues are miR-34c and its target gene BCL2 expressed?’. The IMOTA repository is freely available online at https://ccb-web.cs.uni-saarland.de/imota/
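
The two example queries reduce to simple joins between tissue-level expression tables and an interaction list. A minimal sketch with toy, hypothetical values (not IMOTA data), hypothetical evidence labels and a hypothetical expression cutoff could look like this; in practice, mRNA and protein levels would be kept in separate tables so that gene symbols do not collide.

```python
def co_expressed_pairs(tissue, expression, interactions, min_level=1.0, evidence="experimental"):
    """miRNA-target pairs with the given evidence that are both expressed in a tissue."""
    levels = expression.get(tissue, {})
    return [
        (mirna, gene)
        for mirna, gene, ev in interactions
        if ev == evidence
        and levels.get(mirna, 0.0) >= min_level
        and levels.get(gene, 0.0) >= min_level
    ]


def tissues_with_pair(mirna, gene, expression, min_level=1.0):
    """Sorted list of tissues in which both the miRNA and its target are expressed."""
    return sorted(
        tissue
        for tissue, levels in expression.items()
        if levels.get(mirna, 0.0) >= min_level and levels.get(gene, 0.0) >= min_level
    )


if __name__ == "__main__":
    expression = {  # toy values: tissue -> molecule -> expression level
        "lung": {"miR-34c": 5.0, "BCL2": 3.0, "TP53": 2.0},
        "liver": {"miR-34c": 0.2, "BCL2": 4.0},
        "brain": {"miR-34c": 2.0, "BCL2": 2.0},
    }
    interactions = [  # (miRNA, target gene, evidence type), hypothetical
        ("miR-34c", "BCL2", "experimental"),
        ("miR-34c", "TP53", "predicted"),
    ]
    print(co_expressed_pairs("lung", expression, interactions))
    print(tissues_with_pair("miR-34c", "BCL2", expression))
```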